Explainable Reinforcement Learning via a Causal World Model
Generating explanations for reinforcement learning (RL) is challenging as
actions may produce long-term effects on the future. In this paper, we develop
a novel framework for explainable RL by learning a causal world model without
prior knowledge of the causal structure of the environment. The model captures
the influence of actions, allowing us to interpret the long-term effects of
actions through causal chains, which present how actions influence
environmental variables and finally lead to rewards. Different from most
explanatory models which suffer from low accuracy, our model remains accurate
while improving explainability, making it applicable in model-based learning.
As a result, we demonstrate that our causal model can serve as the bridge
between explainability and learning.
Comment: Accepted by IJCAI 202
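The idea of explaining an action through causal chains can be illustrated with a minimal sketch. It assumes the causal structure has already been learned and is available as a directed acyclic graph; the variable names (`speed`, `fuel`, `distance`) are hypothetical and not taken from the paper, and the paper's own discovery and explanation procedures are not reproduced here.

```python
from typing import Dict, List

# Hypothetical learned causal graph: each key influences the variables it lists.
# Variable names are illustrative only and do not come from the paper.
CAUSAL_GRAPH: Dict[str, List[str]] = {
    "action": ["speed", "fuel"],
    "speed": ["distance"],
    "fuel": ["reward"],
    "distance": ["reward"],
    "reward": [],
}


def causal_chains(graph: Dict[str, List[str]], source: str, target: str) -> List[List[str]]:
    """Return every directed path from source to target (graph assumed acyclic)."""
    if source == target:
        return [[target]]
    chains = []
    for child in graph.get(source, []):
        for tail in causal_chains(graph, child, target):
            chains.append([source] + tail)
    return chains


chains = causal_chains(CAUSAL_GRAPH, "action", "reward")
for chain in chains:
    print(" -> ".join(chain))
```

Each printed path is one candidate explanation of how the action eventually influences the reward.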
Learning to Collaborate by Grouping: a Consensus-oriented Strategy for Multi-agent Reinforcement Learning
Multi-agent systems require effective coordination between groups and
individuals to achieve common goals. However, current multi-agent reinforcement
learning (MARL) methods primarily focus on improving individual policies and do
not adequately address group-level policies, which leads to weak cooperation.
To address this issue, we propose a novel Consensus-oriented Strategy (CoS)
that emphasizes group and individual policies simultaneously. Specifically, CoS
comprises two main components: (a) the vector quantized group consensus module,
which extracts discrete latent embeddings that represent the stable and
discriminative group consensus, and (b) the group consensus-oriented strategy,
which integrates the group policy using a hypernet and the individual policies
using the group consensus, thereby promoting coordination at both the group and
individual levels. Through empirical experiments on cooperative navigation
tasks with both discrete and continuous spaces, as well as Google Research
Football, we demonstrate that CoS outperforms state-of-the-art MARL algorithms
and achieves better collaboration, thus providing a promising solution for
achieving effective coordination in multi-agent systems.
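The vector quantization step mentioned above, which maps a continuous embedding to a discrete latent code, can be sketched as a nearest-neighbour lookup in a codebook. This is a generic illustration of vector quantization, not the CoS implementation; the codebook values and the function name `quantize` are assumptions, and training of the codebook is omitted.

```python
from typing import List, Sequence, Tuple


def quantize(embedding: Sequence[float],
             codebook: List[Sequence[float]]) -> Tuple[int, Sequence[float]]:
    """Return the index and vector of the codebook entry nearest to embedding."""

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(embedding, codebook[i]))
    return index, codebook[index]


# Hypothetical codebook with three discrete "consensus" codes.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
index, code = quantize((0.9, 0.7), codebook)
print(index, code)
```

The returned index is the discrete code; agents whose embeddings fall near the same entry would share that code.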
An End-to-End Task Allocation Framework for Autonomous Mobile Systems
This work addresses the problem of task allocation and planning for multi-agent systems, with a particular interest in promoting adaptability. We propose a novel end-to-end task allocation framework that employs reinforcement learning methods to replace the handcrafted heuristics used in previous works. The proposed framework achieves high adaptability and more competitive results, advantages that stem from learning from feedback. The system-level objectives are adjustable and respond intuitively to the reward design. The framework is validated in a set of tests with various parameter settings, where adaptability and performance are demonstrated.
Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs
Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising
paradigm to address the exploration-exploitation dilemma in reinforcement
learning. It decomposes the source task into subgoal conditional subtasks and
conducts exploration and exploitation in the subgoal space. The effectiveness
of GCHRL heavily relies on subgoal representation functions and subgoal
selection strategy. However, existing works often overlook the temporal
coherence in GCHRL when learning latent subgoal representations and lack an
efficient subgoal selection strategy that balances exploration and
exploitation. This paper proposes HIerarchical reinforcement learning via
dynamically building Latent Landmark graphs (HILL) to overcome these
limitations. HILL learns latent subgoal representations that satisfy temporal
coherence using a contrastive representation learning objective. Based on these
representations, HILL dynamically builds latent landmark graphs and employs a
novelty measure on nodes and a utility measure on edges. Finally, HILL develops
a subgoal selection strategy that balances exploration and exploitation by
jointly considering both measures. Experimental results demonstrate that HILL
outperforms state-of-the-art baselines on continuous control tasks with sparse
rewards in sample efficiency and asymptotic performance. Our code is available
at https://github.com/papercode2022/HILL.
Comment: Accepted by the International Joint Conference on Neural Networks (IJCNN) 202
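A subgoal selection rule that jointly considers a novelty measure on nodes and a utility measure on edges might look like the sketch below. The additive score and the trade-off weight `beta` are assumptions made for illustration; the abstract does not state how HILL combines the two measures, and the landmark names are hypothetical.

```python
from typing import Dict


def select_subgoal(current: str,
                   edge_utility: Dict[str, Dict[str, float]],
                   node_novelty: Dict[str, float],
                   beta: float = 1.0) -> str:
    """Pick the neighbouring landmark maximizing utility + beta * novelty.

    The additive combination is an illustrative assumption, not the paper's rule.
    """
    candidates = edge_utility.get(current, {})
    if not candidates:
        raise ValueError(f"landmark {current!r} has no outgoing edges")
    return max(candidates, key=lambda n: candidates[n] + beta * node_novelty[n])


# Hypothetical landmark graph: utilities on edges out of A, novelty on nodes.
edge_utility = {"A": {"B": 0.8, "C": 0.3}}
node_novelty = {"B": 0.1, "C": 0.9}

exploit = select_subgoal("A", edge_utility, node_novelty, beta=0.0)
explore = select_subgoal("A", edge_utility, node_novelty, beta=1.0)
print(exploit, explore)
```

With `beta=0.0` the rule is purely exploitative and follows the highest-utility edge; raising `beta` shifts the choice toward rarely visited landmarks.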
Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function -- with Real Applications in Traffic Domain
The previous state-of-the-art (SOTA) method achieved a remarkable execution
accuracy on the Spider dataset, which is one of the largest and most diverse
datasets in the Text-to-SQL domain. However, when we reproduced this method on
a business dataset, we observed a significant drop in performance. We examined
the differences in dataset complexity, as well as the clarity of questions'
intentions, and assessed how those differences could impact the performance of
prompting methods. Subsequently, we develop a more adaptable and more general
prompting method, involving mainly query rewriting and SQL boosting, which
respectively transform vague information into exact and precise information and
enhance the SQL itself by incorporating execution feedback and the query
results from the database content. In order to prevent information gaps, we
include the comments, value types, and value samples for columns as part of the
database description in the prompt. Our experiments with Large Language Models
(LLMs) illustrate the significant performance improvement on the business
dataset and prove the substantial potential of our method. In terms of
execution accuracy on the business dataset, the SOTA method scored 21.05, while
our approach scored 65.79. As a result, our approach achieved a notable
performance improvement even when using a less capable pre-trained language
model. Last but not least, we also explore the Text-to-Python and
Text-to-Function options and analyze in depth the pros and cons of each,
offering valuable insights to the community.
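Including comments, value types, and value samples for each column in the database description could be done along the lines of the sketch below. The `Column` dataclass, the output layout, and the `traffic_events` table are hypothetical; the abstract does not specify the prompt format actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    value_type: str
    comment: str
    samples: List[str]


def describe_table(table: str, columns: List[Column]) -> str:
    """Render one table as prompt text: one line per column with type, comment, samples."""
    lines = [f"Table: {table}"]
    for col in columns:
        samples = ", ".join(col.samples)
        lines.append(f"- {col.name} ({col.value_type}): {col.comment}; e.g. {samples}")
    return "\n".join(lines)


# Hypothetical schema used only to show the rendered description.
description = describe_table(
    "traffic_events",
    [
        Column("road_id", "TEXT", "identifier of the road segment", ["R001", "R002"]),
        Column("speed_kmh", "REAL", "average vehicle speed", ["42.5", "61.0"]),
    ],
)
print(description)
```

The rendered text would be placed in the prompt ahead of the user question, so the model sees what each column means and what its values look like.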
Learning Multi-Agent Action Coordination via Electing First-Move Agent
Learning to coordinate actions among agents is essential in complicated multi-agent systems. Prior works are constrained mainly by the assumption that all agents act simultaneously, and asynchronous action coordination between agents is rarely considered. This paper introduces a bi-level multi-agent decision hierarchy for coordinated behavior planning. We propose a novel election mechanism in which we adopt a graph convolutional network to model the interaction among agents and elect a first-move agent for asynchronous guidance. We also propose a dynamically weighted mixing network to effectively reduce the misestimation of the value function during training. This work is the first to explicitly model asynchronous multi-agent action coordination, and this explicitness makes it possible to choose the optimal first-move agent. The results on Cooperative Navigation and Google Football demonstrate that the proposed algorithm can achieve superior performance in cooperative environments. Our code is available at https://github.com/Amanda-1997/EFA-DWM.
Mixture of personality improved Spiking actor network for efficient multi-agent cooperation
Adaptive human-agent and agent-agent cooperation are becoming more and more
critical in the research area of multi-agent reinforcement learning (MARL),
where remarkable progress has been made with the help of deep neural networks.
However, many established algorithms can only perform well during the learning
paradigm but exhibit poor generalization during cooperation with other unseen
partners. Personality theory in cognitive psychology holds that humans handle
this cooperation challenge well by first predicting others' personalities and
then their complex actions. Inspired by this two-step
psychology theory, we propose a biologically plausible mixture of personality
(MoP) improved spiking actor network (SAN), whereby a determinantal point
process is used to simulate the complex formation and integration of different
types of personality in MoP, and dynamic and spiking neurons are incorporated
into the SAN for efficient reinforcement learning. The benchmark Overcooked
task, which strongly requires cooperative cooking, is selected to
test the proposed MoP-SAN. The experimental results show that MoP-SAN
achieves high performance not only in the learning paradigm but also in
the generalization test paradigm (i.e., cooperation with other unseen agents),
where most counterpart deep actor networks failed. Ablation
experiments and visualization analyses were conducted to explain why MoP and
SAN are effective in multi-agent reinforcement learning scenarios while DNNs
perform poorly in the generalization test.
Comment: 20 pages, 7 figures
GCS: Graph-Based Coordination Strategy for Multi-Agent Reinforcement Learning
Many real-world scenarios involve a team of agents that have to coordinate
their policies to achieve a shared goal. Previous studies mainly focus on
decentralized control to maximize a common reward and barely consider the
coordination among control policies, which is critical in dynamic and
complicated environments. In this work, we propose factorizing the joint team
policy into a graph generator and graph-based coordinated policy to enable
coordinated behaviours among agents. The graph generator adopts an
encoder-decoder framework that outputs directed acyclic graphs (DAGs) to
capture the underlying dynamic decision structure. We also apply the
DAGness-constrained and DAG depth-constrained optimization in the graph
generator to balance efficiency and performance. The graph-based coordinated
policy exploits the generated decision structure. The graph generator and
coordinated policy are trained simultaneously to maximize the discounted
return. Empirical evaluations on Collaborative Gaussian Squeeze, Cooperative
Navigation, and Google Research Football demonstrate the superiority of the
proposed method.
Comment: Accepted by AAMAS 202
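To make the notions of DAG-ness and DAG depth concrete, the sketch below checks whether a generated decision graph is acyclic and, if so, computes its depth (the number of edges on the longest path). This is a discrete check based on Kahn's topological sort, offered only as an illustration; the paper's constraints are applied during optimization and are not reproduced here. The function name and the agent indices are assumptions.

```python
from typing import Dict, List, Optional


def dag_depth(graph: Dict[int, List[int]]) -> Optional[int]:
    """Return the longest-path length in edges, or None if the graph has a cycle."""
    indegree = {node: 0 for node in graph}
    for children in graph.values():
        for child in children:
            indegree[child] = indegree.get(child, 0) + 1

    depth = {node: 0 for node in indegree}
    frontier = [node for node, deg in indegree.items() if deg == 0]
    visited = 0
    while frontier:
        node = frontier.pop()
        visited += 1
        for child in graph.get(node, []):
            # A child acts after its deepest parent.
            depth[child] = max(depth[child], depth[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                frontier.append(child)

    if visited != len(indegree):
        return None  # some nodes were never freed, so a cycle exists
    return max(depth.values(), default=0)


# Hypothetical decision structure: agent 0 acts first, then 1, then 2.
acyclic = {0: [1, 2], 1: [2], 2: []}
cyclic = {0: [1], 1: [0]}
print(dag_depth(acyclic), dag_depth(cyclic))
```

A smaller depth means fewer sequential decision rounds among agents, which is one way to read the trade-off between efficiency and performance.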